The tomato is one of the most popular vegetables in Asia. In Bangladesh, it
is the second most significant vegetable consumed. Tomatoes are served not
only as a vegetable but also as sauce, jam, and other products, and they are
used in many types of cuisine. However, a number of dangerous pests damage
thousands of tons of tomatoes in Bangladesh every year. We develop a
solution to recognize pests at an early stage. Five pest types are studied in
this research: aphids, red spider mites, whiteflies, looper caterpillars, and
thrips. To identify tomato pests, we curated image datasets from online and
offline repositories and processed them using a convolutional neural network
(CNN) model. We also used features from the CNN layers as inputs to three
machine learning algorithms: random forest (RF), support vector machine
(SVM), and K-nearest neighbors (K-NN). This comprehensive approach allowed a
thorough comparison of these algorithms in tomato pest recognition. Our
methods perform well at recognizing tomato pests: the best accuracy in our
experiment is 95.49%, achieved by the CNN.
1. INTRODUCTION

Tomatoes, which originated in South America, are also called the love apple. The tomato is an
herbaceous annual plant of the Solanaceae family that can grow 0.7-2 meters high. It produces
yellow flowers in cymes of 3-12 and round fruit of different colors (red, orange, pink, purple, brown, and
yellow). The color differs across countries. Tomatoes normally grow at temperatures of 21-24 °C
in soil with a pH of 5.5-6.8. Like other plants, tomatoes are subject to various diseases, including fungal
infections. Light green or yellowish spots on the leaves, which often grow larger and cause discoloration,
are typical symptoms of both fungal infections and insect infestations. In humid environments, the lower
leaf surfaces can become coated with a gray, velvety growth caused by the spores that the fungal
infection produces.

The tomato is considered one of the most popular and important crops in the world [1]. An
estimated 188 million tons of tomatoes were produced globally in 2018. Because consumption has
increased substantially, notably in China and India, worldwide demand for this adaptable fruit is now
even higher [2], [3]. The health benefits of tomatoes are complemented by their
composition, in particular their extraordinarily high antioxidant concentration. Lycopene (nearly
80%) is the most abundant antioxidant in tomatoes [4]. The unique organoleptic, nutritional, and
compositional qualities of tomatoes make them essential food ingredients of remarkable gourmet and
industrial relevance. Tomato-based products are frequently suspected of food adulteration of various
types, which has a serious negative impact on the economy and occasionally even on health [5].

Tomato crops in home gardens face damage from various pests, including insects, nematodes, and
mites. Nematodes, russet mites, and budworms are particularly destructive. Effective pest management
strategies are crucial to protect crops throughout their growth stages. Aphids, whiteflies, thrips, looper
caterpillars, and red spider mites are the most common pests detected in tomatoes. Tomato plants use a
defense mechanism in which they emit chemicals such as methyl jasmonate, which attract pests and alert
nearby plants. When nearby plants sense these chemicals, they produce deterrent compounds that protect
them from potential harm, a response with potential agricultural applications. Tomatoes are enjoyed all
over the world, and the major product of industrial tomato cultivars is concentrated paste [6].
Maintaining healthy soil, consistent moisture, and proper nutrition is crucial for home-grown tomatoes
and enhances their resilience against pests and diseases [7]. Pests and diseases are both very common
in gardening. Although tomatoes grow fast, pest attacks make the plants challenging to grow.

To increase the profit from plant yield and growth, plant diseases must be identified. Monitoring
plant diseases by hand does not consistently produce correct results, and finding subject matter specialists
to track pest-borne plant diseases is difficult and expensive for farmers. Numerous artificial
intelligence strategies are currently being developed to detect and diagnose plant diseases automatically
with minimal human effort. Recently developed artificial intelligence methods for computer vision and
natural language processing include convolutional neural networks (CNN) [8]. Efficient image recognition
technology may improve identification efficiency, reduce costs, and improve recognition accuracy.
As a result, specialists and researchers both at home and abroad have conducted extensive studies, with deep
learning as the primary emphasis. The emergence of deep learning provides substantial
technological support for image recognition, and the CNN is a widely used deep learning model. A CNN-based
disease and pest detection approach can automatically extract features from the original image,
eliminating the subjectivity and limitations of manual feature extraction in existing methods. In our work,
we studied the early detection of tomato pests using deep learning algorithms.

According to Vatti [1], the agriculture industry is embracing scouting robots, with major
corporations investing in AI-powered solutions to reduce dependency on human labor. The researchers used
CNN models such as VGG16, VGG19, Xception, ResNet50, and Inception V3 to analyze a dataset and evaluated
their effectiveness using various metrics. The research aims to reduce the need for human labor in
harvesting, and the best CNN model achieves a classification accuracy of 0.95. Vitalis et al. [2] employed
both conventional techniques (soluble solid content and consistency) and advanced analytical techniques to
identify and predict extremely low levels of adulterants in tomato paste, including paprika seed
and corn starch (at concentrations of 0.5%, 1%, 2%, and 5%) as well as sucrose and salt (at concentrations of
0.5%, 1%, 2%, and 5%). They conducted univariate statistical analysis (ANOVA) on the data obtained through
conventional techniques. For the data generated through advanced analytical methods such as NIR
spectroscopy and e-tongue, they adopted multivariate approaches such as principal component analysis
(PCA), linear discriminant analysis (LDA), and partial least squares regression (PLSR) to assess and
interpret the results. Simeone et al. [3] presented a neural network regression
model to predict surface fouling quantity. The study found that three different food fouling materials were
effectively cleaned using different cleaning methods. The models achieved 98% accuracy in forecasting
fouling area and 97% accuracy in forecasting fouling volume. This research highlights the practical
application of sensors and machine learning techniques in monitoring and optimizing cleaning procedures.
Velioğlu et al. [4] applied a SUB-adaptive neuro-fuzzy inference system (MLA-ANFIS) approach to
a dataset of seven tomato images from a farm. Deep stacked sparse auto-encoders (DSSAEs) were used to
assess tomato quality, achieving a 95.5% accuracy rate. The DSSAE method surpassed previous techniques
in accuracy and originality. N et al. [9] aimed to assist farmers in identifying disease and preventing
it in its early stages. They worked with a CNN because it extracts a large number of features from image
datasets, in contrast to alternative classification techniques. The trained model has 97% accuracy.
Liu and Wang [10] investigated how to
build a dataset of tomato diseases and pests in a real-world environment, optimize the feature layer of the
YOLOv3 model by using an image pyramid to achieve multi-scale feature detection, improve the detection
accuracy and speed of the YOLOv3 model, and accurately and quickly detect the location and category of
tomato diseases and pests. The above research breaks through the essential technology of tomato pest image


Bulletin of Electr Eng & Inf, Vol. 13, No. 1, February 2024: 619-627 


Bulletin of Electr Eng & Inf, ISSN: 2302-9285


identification in natural environments, providing a reference for intelligent recognition and engineering
application of plant disease and pest detection. Yusiong [11] presented a CNN-ELM model that automates
tomato maturity grading by combining the feature learning of a CNN with the computational efficiency of an
extreme learning machine (ELM), achieving 96.67% classification accuracy and a 96.67% F1-score. This model
is promising for robust and accurate tomato maturity grading in agricultural applications. Aykas et al. [12]
used a field-deployable portable infrared spectrometer for tomato paste detection. A total of 1,843 samples
were collected from 2015 to 2019 from four leading tomato paste processors in California, USA.
A multi-layer architecture incorporating the
SUB-adaptive neuro-fuzzy inference system (MLA-ANFIS) method, neural networks, regression, and
extreme learning machines (ELMs) has also been applied to a dataset of tomato images obtained from a farm.
The sensitivity, specificity, g-mean, and accuracy of the DSSAEs technique are 83.2%, 96.50%, 89.40%, and
95.5%, respectively [13]. Al-Asheh et al. [14] studied the effects of initial solid concentration, voltage,
and current on the amount of water removed. The measured experimental data can be used to compute the
energy of dewatering, and the neural network model used to represent the experimental data
describes it accurately and sufficiently. Xie et al. [15] predicted tomato freshness and evaluated the
practicality of infrared thermal imaging for this purpose. The data were split into 70% training, 15%
validation, and 15% testing sets. The ANN model showed nearly 90% prediction accuracy,
indicating the feasibility and effectiveness of infrared thermal imaging in quality assessment and
agricultural applications. To identify and count tomatoes at various growth stages, Fawzia et al. [16]
proposed a method for accurate tomato counting that combines deep instance segmentation, data
synthesis, and color analysis. A Mask R-CNN neural network was trained using synthetic data, and a
color-based thresholding technique was applied to determine each tomato's growth stage. The experiment
demonstrated accurate tomato counting at three distinct stages (green, half-ripe, and fully ripe), showing
its potential for precise tomato analysis and counting across developmental stages. This paper is
organized as follows: section 1 is the introduction, section 2 describes the method, section 3 presents the
results and discussion, and section 4 concludes.


2. METHOD 

This section describes the approach we propose for recognizing tomato pests. High-throughput
technology in biology has produced enormous amounts of data, and the main difficulty in computational
biology is now turning these volumes of data into knowledge. Deep learning algorithms are currently
promoting the use of machine learning in several fields of biology, including plant virology [17].
Production losses from plant diseases can be prevented by remaining vigilant. However, manual monitoring
of plant diseases by botanists and agriculture professionals is time-consuming, difficult, and prone to
mistakes. Machine vision technology can be very helpful in lowering the risk of disease severity [18].
Our workflow begins with data collection, followed by preprocessing and data augmentation. We then
develop a CNN model and compare it with supervised machine learning techniques: K-nearest neighbors
(K-NN), support vector machine (SVM), and random forest (RF). Our data is gathered from both online and
offline sources and divided into training and testing sets: 80% of the data is used for training and 20%
for testing. Figure 1 shows the working process we followed, from data collection to the results of the
study.
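
As an illustration of the 80/20 split described above, the following minimal Python sketch shuffles a list of sample indices with a fixed seed and holds out 20% for testing. The function name `split_indices`, the seed, and the use of a plain random shuffle are assumptions made for this example; the study does not state how the split was drawn. The dataset size of 11,424 is the number of augmented images reported in section 2.2.

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed (assumed) for repeatability
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(11424)
print(len(train_idx), len(test_idx))
```

With 11,424 images this gives 9,139 training and 2,285 test images. The test-set size agrees with the per-class support values in Table 2, which sum to 2,285.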




Figure 1. Workflow diagram 
2.1. Data preprocessing

Preprocessing transforms raw data into a format that computer programs and machine
learning algorithms can understand and evaluate. Before being used as input to the CNN, the raw image
data for image classification needs to be preprocessed [19]. The quality of the preprocessed data has a
large impact on the quality of the final result, so data cleansing is a vital part of any research study.
We removed out-of-focus, redundant, and irrelevant images from our data collection. Of our dataset, 60%
was collected from online platforms and 40% from offline sources.
2.2. Data augmentation

Data augmentation creates new data points from existing ones to artificially increase the amount
of data, which deep learning networks need in order to function well [20]. During training, a
machine learning model's parameters are adjusted or optimized so that it can convert a given
input, such as an image, into the intended output. We augmented our dataset by flipping the images
and by rotating them (by 45, 90, 180, and 270 degrees). Additionally, we
brightened and darkened the images, and we cropped, sharpened, and blurred them.
In this way, 11,424 images were generated and prepared for the subsequent steps.
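
Some of the augmentations listed above can be sketched in a few lines of Python. The example below is only an illustration on a tiny grayscale image stored as nested lists. The helper names, the brightness offset of 40, and the 2x3 sample image are assumptions, not the settings used in the study. The 45-degree rotation, cropping, sharpening, and blurring are omitted because they need interpolation or filtering, which is usually delegated to an image library.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def adjust_brightness(image, delta):
    """Add delta to every pixel and clamp the result to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def augment(image):
    """Return a flipped copy, three rotations, and a brighter and a darker copy."""
    rot90 = rotate_90_clockwise(image)
    rot180 = rotate_90_clockwise(rot90)
    rot270 = rotate_90_clockwise(rot180)
    return [
        flip_horizontal(image),
        rot90,
        rot180,
        rot270,
        adjust_brightness(image, 40),   # assumed offset
        adjust_brightness(image, -40),  # assumed offset
    ]


# A tiny 2x3 grayscale "image" stands in for a real photograph.
sample = [[10, 20, 30],
          [40, 50, 60]]
variants = augment(sample)
print(len(variants))
```

Each source image yields six variants here; the real pipeline applies more transformations per image.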


2.3. Implementation process 

In our research, we worked with 40x40x3 images featuring five different pests: aphids, looper
caterpillars, thrips, whiteflies, and red spider mites. Our proposed model employs a CNN for feature
extraction, consisting of the following key layers. An initial convolutional layer with 96 filters and a
3x3 kernel is followed by a pooling layer. A second convolutional layer with 256 filters, a 3x3 kernel,
and an activation function is followed by another pooling layer. A third convolutional layer with 384
filters, same padding, a 3x3 kernel, and ReLU activation is followed by pooling with a 2x2 window.
A fourth convolutional layer with 512 filters, same padding, a 3x3 kernel, and ReLU
activation is followed by another 2x2 pooling layer. After a flatten layer, a dense layer with 64 neurons
and ReLU activation feeds the output layer, which comprises 5 neurons with softmax
activation. Training uses the sparse categorical cross-entropy
loss function and the Adam optimizer with a learning rate of 0.001. This model is tailored to efficiently
extract features from the input images and make predictions about the five pest classes. The convolution
operation in our CNN model follows the formula:


$$F(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n) \, K(m, n)$$


where F denotes the output, I denotes the input image, K denotes the filter of size m×n, and the operation
is denoted by I*K. Moreover, we used the ReLU activation function, which can be expressed as
f(x)=max(0, x), to increase the nonlinearity. Our model uses max pooling, a commonly used technique that
identifies and retains the highest value in a specified input area. Our proposed CNN model is presented in
Figure 2, which shows the layers, filters, kernel sizes, and parameters along with the output.
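
The three building blocks described above (the convolution sum over I(i + m, j + n) K(m, n), the ReLU function f(x)=max(0, x), and 2x2 max pooling) can be sketched in plain Python as follows. This is a single-channel illustration without padding or bias. The 4x4 input and the 2x2 kernel values are invented for the example, and a real convolutional layer additionally sums over input channels and learns its kernel values.

```python
def conv2d_valid(image, kernel):
    """Compute F(i, j) = sum_m sum_n I(i + m, j + n) * K(m, n) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n]
                for m in range(kh) for n in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Apply f(x) = max(0, x) to every element."""
    return [[max(0, x) for x in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h = len(feature_map) // 2 * 2      # drop a trailing odd row, if any
    w = len(feature_map[0]) // 2 * 2   # drop a trailing odd column, if any
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


image = [[1, 2, 0, 1],
         [0, 1, 3, 2],
         [2, 1, 0, 1],
         [1, 0, 2, 3]]
kernel = [[1, 0],
          [0, -1]]  # illustrative values only

conv = conv2d_valid(image, kernel)
activated = relu(conv)
pooled = max_pool_2x2(activated)
print(conv)
print(activated)
print(pooled)
```

The 4x4 input shrinks to 3x3 after the unpadded convolution, ReLU zeroes the negative responses, and pooling keeps the single largest value of the top-left 2x2 window.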


[Figure 2 diagram: Image (40x40x3) → Conv2D (96 filters, 3x3 kernel) → Max pooling (2x2) → Conv2D (256 filters, 3x3 kernel) → Max pooling (2x2) → Conv2D (384 filters, 3x3 kernel) → Max pooling (2x2) → Conv2D (512 filters, 3x3 kernel) → Max pooling (2x2) → Flatten layer (2048 neurons) → Dense layer (64 neurons) → Output layer]


Figure 2. CNN model 


2.3.1. The summary of the convolutional neural network model 

Table 1 presents the layers of the proposed model with their output shapes and parameter counts. The
model has 3,010,693 parameters in total, all of which are trainable (0 non-trainable parameters).


2.3.2. Performance evaluation matrix 

Performance is measured using the confusion matrices shown in Figure 3, where TP denotes true positives,
TN true negatives, FN false negatives, and FP false positives. Figure 3 presents the confusion matrix
of RF in Figure 3(a), SVM in Figure 3(b), K-NN in Figure 3(c), and CNN in Figure 3(d), which are used
to evaluate and compare the performance of the models.
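
The metrics reported in Tables 2 and 3 can all be derived from TP, TN, FP, and FN counted one class at a time (one-vs-rest). The sketch below uses the standard definitions. Taking the g-mean as the square root of sensitivity times specificity is an assumption, since the paper does not state its formula, and the 3x3 confusion matrix is invented for illustration.

```python
import math


def per_class_metrics(cm, k):
    """One-vs-rest metrics for class k; cm[t][p] counts true class t predicted as p.

    Assumes class k has at least one true and one predicted sample.
    """
    n = len(cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(cm[t][k] for t in range(n)) - tp
    tn = sum(sum(row) for row in cm) - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)  # assumed definition
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "g_mean": g_mean}


def accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


# Invented confusion matrix for three classes (rows: true, columns: predicted).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 0, 10]]

metrics_class0 = per_class_metrics(cm, 0)
print(metrics_class0)
print(accuracy(cm))
```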

In our method, the model is trained on 80% of the data, and tests are run on the remaining 20%,
which the model has not seen. We then ran 25 epochs to observe the resulting graph and produce the
output, since the model would otherwise appear to be overfitted. It is then possible to calculate the
likelihood of each class. We also compared the K-NN, RF, and SVM classifiers. RF is versatile in
handling continuous and categorical variables and can be used for both regression and classification
tasks, and SVM can likewise address classification or regression problems. RF achieves
94.09% accuracy, SVM achieves 93.34%, and K-NN achieves 94.09%, while CNN achieves
95.49% accuracy.


Table 1. Layers of the proposed CNN model

Layer (type)                      Output shape           Parameters
conv2d (Conv2D)                   (None, 38, 38, 96)     2688
max_pooling2d (MaxPooling2D)      (None, 19, 19, 96)     0
conv2d_1 (Conv2D)                 (None, 19, 19, 256)    221440
max_pooling2d_1 (MaxPooling2D)    (None, 9, 9, 256)      0
conv2d_2 (Conv2D)                 (None, 9, 9, 384)      885120
max_pooling2d_2 (MaxPooling2D)    (None, 4, 4, 384)      0
conv2d_3 (Conv2D)                 (None, 4, 4, 512)      1769984
max_pooling2d_3 (MaxPooling2D)    (None, 2, 2, 512)      0
flatten (Flatten)                 (None, 2048)           0
dense (Dense)                     (None, 64)             131136
dense_1 (Dense)                   (None, 5)              325
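
The parameter counts in Table 1 can be reproduced with a short calculation: a convolutional layer has kernel x kernel x input channels x filters weights plus one bias per filter, and a dense layer has inputs x units weights plus one bias per unit. The sketch below is a consistency check, not the authors' code. The padding mode of each block is an assumption inferred from the output shapes and from the text ("valid" for the first convolution, "same" for the others).

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


# (filters, padding) per convolution block; each block ends with 2x2 max pooling.
# Padding modes are assumed from the shapes reported in Table 1.
blocks = [(96, "valid"), (256, "same"), (384, "same"), (512, "same")]

size, channels, total = 40, 3, 0   # 40x40x3 input image
for filters, padding in blocks:
    total += conv_params(3, channels, filters)
    if padding == "valid":
        size -= 2                  # a 3x3 kernel without padding trims one pixel per side
    size //= 2                     # 2x2 max pooling halves the spatial size (floor)
    channels = filters

flat = size * size * channels      # length of the flattened feature vector
total += dense_params(flat, 64) + dense_params(64, 5)
print(flat, total)
```

The script prints a flattened size of 2048 and a total of 3010693 parameters, matching the values reported for the model.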
Figure 3. Comparison of the confusion matrices: (a) RF, (b) SVM, (c) K-NN, and (d) CNN


3. RESULTS AND DISCUSSION
3.1. Result 

Images of five tomato pests are used in our study. The CNN is one of the most effective image
classification methods [21], [22], and we used it to evaluate the performance. For comparison,
we also used the features extracted from the convolutional layers as inputs for SVM, RF, and
K-NN. Although all of them perform well, CNN gives the best result. Table 2 shows the measured
performance in terms of precision, recall, F1-score, and support for each pest class, and the differences
between the models can be seen there. Accuracy plays an important role in the
performance analysis. Table 3 analyzes the classification performance and compares the
models. CNN achieves the highest score among the machine learning algorithms. Model accuracy is presented
in Figure 4 as a line graph, which shows how the model's accuracy on the training dataset compares with
its test accuracy over the course of 25 epochs.
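
To make the "convolution layer + K-NN" idea concrete, the sketch below classifies a feature vector by majority vote among its nearest training vectors. It is a toy illustration only: the two-dimensional feature vectors, the labels, the Euclidean distance, and k=3 are all assumptions. In the study, the inputs would be the features extracted by the CNN, and the value of k is not reported.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Invented 2-D "features"; real CNN features would have many more dimensions.
train_features = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
                  (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
train_labels = ["aphid", "aphid", "aphid", "thrips", "thrips", "thrips"]

print(knn_predict(train_features, train_labels, (0.2, 0.2)))
print(knn_predict(train_features, train_labels, (0.9, 0.9)))
```

The first query lies among the "aphid" vectors and the second among the "thrips" vectors, so they are labeled accordingly.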


Table 2. Performance analysis of CNN and other algorithms

                                          Pests
Model                       Metric        Aphids   Looper caterpillars   Thrips   Whitefly   Red spider mite
CNN                         Precision     0.98     0.95                  0.94     0.89       1.00
                            Recall        0.87     0.95                  0.95     0.99       0.99
                            F1            0.92     0.95                  0.94     0.93       0.99
                            Support       461      487                   493      455        389
Convolution layer + RF      Precision     0.93     0.92                  0.93     0.94       1.00
                            Recall        0.89     0.92                  0.92     0.96       0.97
                            F1            0.91     0.94                  0.93     0.95       0.99
                            Support       461      487                   493      455        389
Convolution layer + SVM     Precision     0.90     0.95                  0.88     0.95       1.00
                            Recall        0.90     0.93                  0.95     0.93       0.97
                            F1            0.90     0.94                  0.91     0.94       0.98
                            Support       461      487                   493      455        389
Convolution layer + K-NN    Precision     0.92     0.96                  0.89     0.95       1.00
                            Recall        0.90     0.93                  0.96     0.96       0.97
                            F1            0.91     0.94                  0.92     0.95       0.99
                            Support       461      487                   493      455        389


Table 3. Classification performance of CNN and other algorithms

Metric               CNN (%)   Convolution layer + RF (%)   Convolution layer + SVM (%)   Convolution layer + K-NN (%)
Accuracy             95.49     94.09                        93.34                         94.09
Precision (avg)      95.2      94.4                         93.6                          94.4
Specificity (avg)    95.0      93.8                         92.8                          93.4
Sensitivity (avg)    95.0      93.7                         93.4                          94.4
G-mean (avg)         92.1      91.6                         92.0                          92.0
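
The "Precision (avg)" row of Table 3 is consistent with an unweighted (macro) average of the per-class precision values in Table 2. The short check below illustrates this. It is an observation about the reported numbers rather than a description of the authors' procedure, and the function name is chosen for this example.

```python
# Per-class precision values copied from Table 2, in the order: aphids,
# looper caterpillars, thrips, whitefly, red spider mite.
precision = {
    "CNN": [0.98, 0.95, 0.94, 0.89, 1.00],
    "Convolution layer + RF": [0.93, 0.92, 0.93, 0.94, 1.00],
    "Convolution layer + SVM": [0.90, 0.95, 0.88, 0.95, 1.00],
    "Convolution layer + K-NN": [0.92, 0.96, 0.89, 0.95, 1.00],
}


def macro_average_percent(values):
    """Unweighted mean of per-class scores, expressed as a percentage."""
    return round(100 * sum(values) / len(values), 1)


macro = {model: macro_average_percent(v) for model, v in precision.items()}
print(macro)
```

The computed averages are 95.2, 94.4, 93.6, and 94.4 for CNN, RF, SVM, and K-NN, which match the Precision (avg) row.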
Figure 4. Training accuracy vs validation accuracy


3.2. Discussion

Our proposed model shows excellent performance. Table 2 compares the classification performance of
CNN and of the convolution layers combined with other machine learning techniques according to the
evaluation metrics. Moreover, one would want to be able to identify individual tomato plants in addition
to making plot-based comparisons. This would also solve the assignment issue, as each plant could be
readily coded by a label at the pot, and tomatoes could be distinguished by how far from the pot they are
on the stem [23]-[25]. The best accuracy is 95.49% for CNN, whereas the corresponding figures for
convolution layer + RF, convolution layer + K-NN, and convolution layer + SVM are 94.09%, 94.09%, and
93.34%, respectively (Table 3). Additionally, CNN achieves the highest average sensitivity (95.0%),
specificity (95.0%), and g-mean (92.1%). Our approach is therefore successful at recognizing pests.


4. CONCLUSION 

The development of agriculture contributes to the success of the national economy. Bangladesh is a
country that depends mostly on agriculture, and the tomato is one of its important vegetables. Most
people consider tomatoes a vegetable, although some consider
them a fruit. Over the nine years from 2012 to 2021, tomato consumption
increased at an average annual rate of 6.8%, and it continues to rise every year. Almost all types of
cuisine use tomatoes. Moreover, Bangladesh exports tomatoes annually to several countries. Pests hamper
tomato production, causing farmers significant losses and affecting the nation's economy. Early detection
and treatment of plant diseases dramatically reduce production losses. The use of image-based automatic
plant disease identification (APDI) systems in pest management has begun to spread. CNNs are used to
extract target areas from images, segment objects, and determine the quantity and kind of pests on leaves,
fruits, and vegetables. Image recognition is mostly used by training neural network models to classify
categories. Our proposed approach produces excellent results in recognizing
pests. In this study, five types of pests are examined using image datasets gathered from both online
and offline platforms. We obtained the desired result using 80% of the data for training and 20% for
testing. The comparison with other machine learning algorithms shows that the CNN produces the best
results in our experiments. The study's
findings show that our CNN model has a 95.49% accuracy rate for recognizing pests. CNNs offer fresh
approaches and concepts as well as a solid technological foundation, and farmers could benefit
considerably from this approach.